Class-sensitive Principal Components Analysis
Authors
Abstract
Di Miao: Class-Sensitive Principal Components Analysis (under the direction of J. S. Marron and Jason P. Fine)

Research in a number of fields requires the analysis of complex datasets. Principal Components Analysis (PCA) is a popular exploratory method. However, it is driven entirely by variation in the dataset and does not use any predefined class label information. Linear classifiers make up a family of popular discrimination methods. However, these often run into the data piling problem as the dimension of the dataset grows. In this dissertation, we first study the geometric representation of an interesting dataset with strongly auto-regressive errors under the High Dimensional Low Sample Size (HDLSS) setting, and explain why Maximal Data Piling (MDP), proposed by Ahn et al. (2007), gives the best classification among several commonly used linear discrimination methods. We then introduce Class-Sensitive Principal Components Analysis (CSPCA), a compromise between PCA and MDP that seeks new direction vectors for better class-sensitive visualization. Specifically, this method is applied to the Thyroid Cancer dataset (see Agrawal et al. (2014)). Additionally, we investigate the asymptotic behavior of the sample and population MDP normal vectors and Class-Sensitive Principal Component directions under the HDLSS setting. Finally, the multi-class version of CSPCA (MCSP) is introduced in the last part of this dissertation.

ACKNOWLEDGMENTS

I would like to express my deepest gratitude and appreciation to my advisors, Dr. J. S. Marron and Dr. Jason P. Fine, for their generous guidance, support and encouragement. I am especially indebted to Dr. J. S. Marron for his patience, criticism and help during the writing of this dissertation. I would like to thank the other committee members, Dr. Andrew Nobel, Dr. Yufeng Liu and Dr. Eric Bair, for their valuable suggestions and comments. Thanks also go to Dr. Vonn Walter for kindly providing the Thyroid cancer data, which led to interesting applications in this dissertation. I wish to thank my parents for their care from far away and their confidence in me. Last but not least, I would like to give my special appreciation to my wife, Mingjun Zhu. Without her sacrificial love, everlasting support and constant prayers, getting to this point would not have been possible.
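As a rough illustration of the data piling phenomenon mentioned in the abstract, the Python sketch below computes a two-class MDP direction on synthetic HDLSS data. It assumes the common formulation in which the direction is the pseudo-inverse of the global sample covariance applied to the difference of the class means. The function name `mdp_direction`, the simulated data and the singular-value tolerance are illustrative assumptions and are not taken from the dissertation.

```python
import numpy as np


def mdp_direction(X, y, rel_tol=1e-10):
    """Sketch of a two-class MDP direction (hypothetical helper).

    X is an (n, d) data matrix and y holds binary labels 0/1.
    Assumes w is proportional to pinv(global covariance) @ (mean_1 - mean_0).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    mean_diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

    # Center with the global mean, then take a thin SVD: Xc = U diag(s) Vt.
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Drop numerically zero singular values (centering removes one rank).
    keep = s > s.max() * rel_tol
    V = Vt[keep].T  # (d, r) basis of the data subspace

    # Global covariance is V diag(s^2 / (n - 1)) V^T; apply its pseudo-inverse.
    w = V @ ((V.T @ mean_diff) * (n - 1) / s[keep] ** 2)
    return w / np.linalg.norm(w)


# Synthetic HDLSS example: d = 50 dimensions, n = 10 observations.
rng = np.random.default_rng(0)
n_per_class, d = 5, 50
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, d)),
    rng.normal(0.5, 1.0, size=(n_per_class, d)),
])
y = np.repeat([0, 1], n_per_class)

w = mdp_direction(X, y)
proj = X @ w  # one projected value per observation

# Within-class spread of the projections; both should be numerically zero.
spread_0 = proj[y == 0].max() - proj[y == 0].min()
spread_1 = proj[y == 1].max() - proj[y == 1].min()
print(spread_0, spread_1)
```

When the dimension exceeds the sample size, all observations of a class project onto a single point along this direction, which is the piling behavior that a class-sensitive compromise with PCA is meant to temper.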
Similar resources
Identification of Drought Tolerant Oilseed Rape Genotypes using Multivariate Analysis
Extended Abstract Introduction and Objective: Drought stress, one of the most important abiotic stresses, is the main limiting factor of oilseed rape cultivation in arid and semi-arid climates. Therefore, identifying drought-tolerant genotypes is an essential program in these regions. One appropriate method for identifying drought-tolerant genotypes is the use of stress toleranc...
Steganalysis Using High-Dimensional Features Derived from Co-occurrence Matrix and Class-Wise Non-Principal Components Analysis (CNPCA)
This paper presents a novel steganalysis scheme with high-dimensional feature vectors derived from the co-occurrence matrix in either the spatial domain or the JPEG coefficient domain, which is sensitive to the data embedding process. Class-wise non-principal components analysis (CNPCA) is proposed to solve the classification problem in the high-dimensional feature vector space. The experimental resu...
Patterns Prediction of Chemotherapy Sensitivity in Cancer Cell lines Using FTIR Spectrum, Neural Network and Principal Components Analysis
Drug resistance enables cancer cells to escape the cytotoxic effect of anticancer drugs. Identifying the resistant phenotype is very important because it can lead to an effective treatment plan. There is interest in developing classification models of the resistance phenotype based on multivariate data. We have investigated a vibrational spectroscopic approach in order to characterize a...
A new weighting approach to Non-Parametric composite indices compared with principal components analysis
The introduction of the Human Development Index (HDI) by UNDP in early 1990 was followed by a surge in the use of non-parametric and parametric indices for measuring and comparing countries' performance in development, globalization, competition, well-being and so on. The HDI is a composite index of three indicators. Its components reflect three major dimensions of human development: longevity, knowledg...